Hierarchical Information-theoretic Co-clustering for High Dimensional Data

نویسندگان

Yuanyuan Wang

Yunming Ye

Xutao Li

Michael K. Ng

Joshua Huang

چکیده

Hierarchical clustering is an important technique for hierarchical data exploration applications. However, most existing hierarchial methods are based on traditional one-side clustering, which is not effective for handling high dimensional data. In this paper, we develop a partitional hierarchical co-clustering framework and propose a Hierarchical Information-Theoretical Co-Clustering (HITCC) algorithm. The algorithm conducts a series of binary partitions of objects on a data set via the Information-Theoretical Co-Clustering (ITCC) procedure, and generates a hierarchical management of object clusters. Due to simultaneously clustering of features and objects in the process of building a cluster tree, the HITCC algorithm can identify subspace clusters at different-level abstractions and acquire good clustering hierarchies. Compared with the flat ITCC algorithm and six state-of-the-art hierarchical clustering algorithms on various data sets, the new algorithm demonstrated much better performance.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

Information Theoretic Clustering of Sparse Co-Occurrence Data

A novel approach to clustering co-occurrence data poses it as an optimization problem in information theory which minimizes the resulting loss in mutual information. A divisive clustering algorithm that monotonically reduces this loss function was recently proposed. In this paper we show that sparse high-dimensional data presents special challenges which can result in the algorithm getting stuc...

متن کامل

A Hierarchical Probabilistic Model for Co-Clustering High-Dimensional Data

We propose a hierarchical, model-based co-clustering framework for handling high-dimensional datasets. The technique views the dataset as a joint probability distribution over row and column variables. Our approach starts by initially clustering rows in a dataset, where each cluster is characterized by a different probability distribution. Subsequently, the conditional distribution of attribute...

متن کامل

Scalable Ensemble Information-Theoretic Co-clustering for Massive Data

Co-clustering is effective for simultaneously clustering rows and columns of a data matrix. Yet different coclustering models usually produce very distinct results. In this paper, we propose a scalable algorithm to co-cluster massive, sparse and high dimensional data and combine individual clustering results to produce a better final result. Our algorithm is particularly suitable for distribute...

متن کامل

Parameter-Free Hierarchical Co-clustering by n-Ary Splits

Clustering high-dimensional data is challenging. Classic metrics fail in identifying real similarities between objects. Moreover, the huge number of features makes the cluster interpretation hard. To tackle these problems, several co-clustering approaches have been proposed which try to compute a partition of objects and a partition of features simultaneously. Unfortunately, these approaches id...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2010

Hierarchical Information-theoretic Co-clustering for High Dimensional Data

نویسندگان

چکیده

منابع مشابه

High-Dimensional Unsupervised Active Learning Method

Information Theoretic Clustering of Sparse Co-Occurrence Data

A Hierarchical Probabilistic Model for Co-Clustering High-Dimensional Data

Scalable Ensemble Information-Theoretic Co-clustering for Massive Data

Parameter-Free Hierarchical Co-clustering by n-Ary Splits

عنوان ژورنال:

اشتراک گذاری